class: center, middle, inverse, title-slide .title[ # Statistical methods with applications to pairs trading and equipment life-time modeling ] .subtitle[ ## Ph.D. Defense ] .author[ ### Allan Quadros ] .date[ ### Ph.D. Candidate | Statistics Kansas State University 2025-04-04 ] --- layout: true background-image: url(./img/logo/logo2.png) background-position: 0% 100% background-size: 5% <style type="text/css"> .highlight-last-item > ul > li, .highlight-last-item > ol > li { opacity: 0.5; } .highlight-last-item > ul > li:last-of-type, .highlight-last-item > ol > li:last-of-type { opacity: 1; } </style> --- ## Table of contents <!-- <br> --> <!-- > Objectives --> > (1) General Introduction <br><br> > (2) Part I - Pairs Trading <br> > + (2.1) Introduction to Pairs Trading <br> > + (2.2) Bayesian Method <br> > + (2.3) Non-Overlapping Block Bootstrap <br> > + (2.4) Conclusion <br><br> > (3) Part II - Equipment Life-cycle Prediction <br> > + (3.1) Introduction to Reliability Theory <br> > + (3.2) Predicting Equipment Life-cycle in the Absence of Data <br> > + (3.3) Conclusion <br><br> --- class: inverse, center, middle <!-- title-slide-section-grey, --> ## Part I - Pairs Trading <!-- --- --> <!-- </br> --> <!-- </br> --> <!-- ###<font color =#4C5455><code>What is (statistical) arbitrage?</code></font> --> <!-- <!-- ??? Pairs trading is a type of statistical arbitrage. So we need to rewind a little bit and understand what statistical arbitrage is. And to understand statistical arbitrage, we need to first understand what arbitrage is --> <!-- <br> --> <!-- > __Arbitrage__ <br><br> Take advantage of price differences in different markets for the same or different assets --> <!-- <!-- ??? start with the example --> <!-- <br> --> <!-- <br> --> <!-- > __Statistical arbitrage__ <br><br> When we use stats to do arbitrage --> <!-- <!-- ???
we can use stats to identify assets, markets, and the best time to trade --> --- class: highlight-last-item ### <code>__Pairs trading:__ <font color =#4C5455>what is it?</font></code> </br> </br> </br> -- + Pairs trading is an investment strategy that exploits temporary mispricings between two assets that historically __move together__. </br> -- + When the pair deviates from its historical norm, investors __`BUY`__ (take a __long__ position in) the undervalued asset and __`SELL`__ (take a __short__ position in) the overvalued one, expecting the spread to revert to its historical average. </br> -- + Its main appeal lies in producing a __low-volatility__ and __market-neutral__ investment strategy. <!-- can be bonds, contracts, options, stocks, etc. --> <!-- </br> --> <!-- -- --> <!-- + It was first employed by a quantitative group at Morgan Stanley in the 1980s --> <!-- </br> --> <!-- -- --> <!-- + It belongs to a broader class of investment strategies called statistical arbitrage - statistical modeling of price relationships among different assets to generate excess returns --> --- ###<font color =#4C5455><code>Example of co-moving assets: __Coca-Cola vs. Pepsi__</code></font>
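For intuition, co-movement can be mimicked with simulated series. A minimal base-R sketch; the series below are synthetic stand-ins, not actual KO/PEP prices, and the levels and volatilities are made-up numbers:

```r
# Simulate two co-moving (cointegrated) log-price series that share a
# common stochastic trend -- synthetic stand-ins for KO and PEP.
set.seed(7)
common <- cumsum(rnorm(500, sd = 0.01))          # shared random-walk trend
ko  <- 3.7 + common + rnorm(500, sd = 0.005)     # made-up level and noise
pep <- 4.2 + 0.9 * common + rnorm(500, sd = 0.005)
matplot(cbind(ko, pep), type = "l", lty = 1,
        xlab = "time", ylab = "log price")
```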
--- ###<font color =#4C5455><code>__Pairs trading__: how to make money from it?</code></font> <img src="index_files/figure-html/unnamed-chunk-3-1.png" width="720" style="display: block; margin: auto;" /> --- class: highlight-last-item ###<font color =#4C5455><code>__Pairs trading__: what are the main challenges?</code></font> <br> <br> > 1. __Identifying pairs__ of securities that exhibit a __stable relationship__ in the desired time frame. <br><br> > 2. Optimal __share allocation__ between the two assets to appropriately hedge against market volatility. <br><br> > 3. Generating __accurate trading signals__ to precisely time the entry and exit points. <br><br> --- class: highlight-last-item ###<font color =#4C5455><code>__Pairs trading__: main adopted strategies</code></font> <br> <br> > Distance Method; > __Cointegration__; > Copulas; > Ornstein-Uhlenbeck; > Machine Learning; > Others. --- ###<font color =#4C5455><code>Cointegration strategy: __pair selection__</code></font> <code> Cointegration test - a two-step procedure developed by Engle & Granger (1987): </code> <br> -- > __One:__ Select two stocks, say `\(X\)` and `\(Y\)`, that historically __move together__ and test if the two price series ( `\(y_t\)` and `\(x_t\)` ) are __non-stationary__, i.e., both series have a unit root `\(\gamma = 0\)`; <br><br> -- > __Two:__ Fit a linear model `\(\hat{y_t} = \hat{\beta_0} + \hat{\beta_1} x_t\)`, and test the __residuals__ ( `\(u_t\)` ) for stationarity, i.e., test if `\(u_t\)` does not have a unit root `\(\gamma < 0\)`. <!-- ??? where `\(I(1)\)` denotes an integrated process of order 1, meaning that the series becomes stationary only after taking the first difference. --> <br> -- + If __`(1)`__ and __`(2)`__ hold, then the series are said to be __cointegrated__ - stocks `\(X\)` and `\(Y\)` share a __long-term equilibrium relationship__.
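The two steps above can be sketched in base R on simulated series. The helper `df_stat` below is a bare-bones Dickey-Fuller `\(t\)`-statistic (no lag augmentation or trend); a real analysis would use a full ADF implementation and the Engle-Granger critical values:

```r
# Engle & Granger two-step sketch on simulated, cointegrated data.
set.seed(42)
x <- cumsum(rnorm(500))          # I(1) random walk
y <- 2 + 0.8 * x + rnorm(500)    # cointegrated with x by construction

# Simplified Dickey-Fuller t-statistic: regress diff(u) on lagged u
# and return the t-value on the lag coefficient (gamma).
df_stat <- function(u) {
  du <- diff(u)
  lu <- u[-length(u)]
  summary(lm(du ~ lu))$coefficients["lu", "t value"]
}

df_stat(x); df_stat(y)           # step one: should NOT reject the unit root

u <- residuals(lm(y ~ x))        # step two: fit y_t = b0 + b1 * x_t
df_stat(u)                       # strongly negative -> stationary residuals
```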
--- ###<font color =#4C5455><code>Cointegration strategy: __trading__</code></font> > Trading signals are generated based on the standard deviations of the spread `\(y_t - \hat{\beta_1} x_t\)` .pull-left[ <br> <br> > __Positions:__ opened whenever the Z-score crosses the `\(\mid \pm k\sigma \mid\)` thresholds > __Sizing:__ go long (or short) `\(\hat{\beta_1}\)` dollars of stock `\(X\)` for each dollar of stock `\(Y\)` > __Take-profit:__ when the Z-score reverts back to `0` - the long-term average. > __Stop-loss:__ when the Z-score reaches a defined loss margin ( `\(\pm|k\sigma + \xi|\)` ). ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-4-1.png" width="504" style="display: block; margin: auto;" /> ] <!-- ??? NOTE: The intercept alpha does not need to enter the calculation, which is precisely why I distorted the first chart with log(PEP) - c. Without c, the effect would be the same; that is, the absolute distance between KO and PEP does not matter - what matters is the relative deviation between the two, not the total deviation considering the initial level. Besides, including alpha would add one more variable to estimate, making the model more complex and prone to more errors and overfitting--> <!-- ??? talk about the Z-score and show chart 2 (change the colors - use Palomar's code) - talk about transforming the two series into a synthetic asset --> --- ###<font color =#4C5455><code> Proposed methods: __motivation__ </code></font> <code> Problems with the standard cointegration method: </code></br></br> -- > High false positive rates when used within a data mining approach;</br></br> -- > Has been widely adopted by both institutional and individual investors, shrinking arbitrage opportunities;</br></br> -- > Can result in premature/delayed exits. <!-- ??? alpha level is inflated --> <!-- ??? less opportunities - the margins have shrunk --> <!-- ???
practically speaking, relying only on the z-score of the spread is not very effective -- sometimes the relationship between the securities changes and the z-score does not capture that. Sometimes, it is too sensitive. --> <!-- ??? Competes against high frequency trading to identify distortions ? --> --- ###<font color =#4C5455><code> Proposed methods: __hypotheses__ </code></font> > __Idea:__ `\(\hat{\beta_1}\)` carries important information about the linear relationship between the underlying __co-moving__ securities.</br></br> -- > __Proposal:__ Add a __confirmation layer__ to the standard cointegration strategy in pairs trading by evaluating the behavior of the __hedge ratio__ `\(\hat{\beta_1}\)`.</br></br> -- > __How:__ Deriving the distribution of `\(\hat{\beta}_1\)` from OLS regression through a __parametric__ or __non-parametric method__, and establishing two thresholds within this distribution that will serve as control/confirmation boundaries for the trading strategy.</br></br> -- <code> __Hypotheses:__ </code> > __(i)__ produces more accurate trading signals; > __(ii)__ reduces the false discovery rate (FDR) of the cointegration test.
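The non-parametric route can be sketched in base R with a block bootstrap of `\(\hat{\beta}_1\)` on simulated data; `n`, `b`, `alpha`, and `B` below are illustrative values, not the backtest settings:

```r
# Build the bootstrap distribution of the hedge ratio and set
# quantile thresholds that act as confirmation boundaries.
set.seed(1)
n <- 260; b <- 13; alpha <- 0.05
x <- cumsum(rnorm(n))            # simulated I(1) series
y <- 1.2 * x + rnorm(n)          # co-moving partner, true beta1 = 1.2

nbb_beta1 <- function(x, y, b, B = 500) {
  starts <- seq(1, length(x), by = b)
  blocks <- lapply(starts, function(s) s:min(s + b - 1, length(x)))
  k <- length(blocks)
  replicate(B, {
    idx <- unlist(blocks[sample(k, k, replace = TRUE)])  # resample whole blocks
    coef(lm(y[idx] ~ x[idx]))[2]                         # refit the hedge ratio
  })
}

beta_star  <- nbb_beta1(x, y, b)
thresholds <- quantile(beta_star, c(alpha / 2, 1 - alpha / 2))
# A hedge ratio estimated on new data that falls outside `thresholds`
# does not confirm the trading signal.
```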
__ `\(^{(***)}\)` __ --- class: inverse, center, middle <!-- title-slide-section-grey, --> ## Bayesian Method --- ###<font color =#4C5455><code> Theoretical Background: __hierarchical Bayesian model__ </code></font> <code> Using results from Gelfand _et al._ (1992), Hooten & Hefley (2019), and Rencher (2007), we have: </code> -- <code> __Likelihood:__ </code> .my-style2[ > `$$\boldsymbol{y} \sim \mathcal{N}(\boldsymbol{X\beta}, \sigma^2\boldsymbol{I})$$` ] .my-style[ where `\(\boldsymbol{\beta} = \begin{bmatrix} \beta_0 & \beta_1 \end{bmatrix}^\top\)` ] -- <code> __Priors:__ </code> .my-style2[ > `$$\boldsymbol{\beta} \sim \mathcal{TN}(\boldsymbol{\mu_{\beta}} = \mathbf{(X'X)^{-1}X'y}, \boldsymbol{\Sigma_{\beta}} = \sigma_0^2 \mathbf{(X'X)^{-1}}, \phantom{.}0, +\infty)$$` ] .my-style[ where `\(\widehat{\sigma_0^2} = \frac{1}{n-2}\boldsymbol{y'(I - H)y}\)` and `\(\boldsymbol{H = X(X'X)^{-1}X'}\)` ] .my-style2[ > `$$\sigma^2 \sim \mathcal{IG}(q, r)$$` ] -- <code> __Conjugate full-conditional posteriors:__ </code> .my-style2[ > `$$\boldsymbol{\beta} \mid \boldsymbol{y}, \sigma^2 \equiv \mathcal{TN}(\boldsymbol{\mu} = \boldsymbol{A^{-1}b}, \boldsymbol{\Sigma} = \boldsymbol{A^{-1}}, \phantom{.}0, +\infty)$$` ] .my-style[ where `\(\boldsymbol{A} \equiv \boldsymbol{X}^\top (\sigma^2\boldsymbol{I})^{-1} \boldsymbol{X} + \boldsymbol{\Sigma_\beta}^{-1}\)` and `\(\boldsymbol{b} \equiv \boldsymbol{X}^\top (\sigma^2\boldsymbol{I})^{-1} \boldsymbol{y} + \boldsymbol{\Sigma_\beta}^{-1} \boldsymbol{\mu_\beta}\)` ] .my-style2[ > `$$\sigma^2 \mid \boldsymbol{y, \beta} \equiv \mathcal{IG}(\tilde q, \tilde r)$$` ] .my-style[ where `\(\tilde q = q + \frac{n}{2}\)` and `\(\tilde r = \left[\frac{1}{2}\boldsymbol{(y - X\beta)'(y - X\beta)} + \frac{1}{r}\right]^{-1}\)` ] --- ###<font color =#4C5455><code> Bayesian method: __mechanics__ </code></font> <img src="index_files/figure-html/unnamed-chunk-5-1.png" width="864" style="display: block; margin: auto;" /> --- class: inverse, center, middle <!--
title-slide-section-grey, --> ## Results for the Bayesian Method --- ###<font color =#4C5455><code> Empirical Results: __data & methodology__ </code></font> <br> <br> -- > Alpha Vantage and MT5 data, 30-minute timeframe; <br><br> -- > 2 years of data: __`02-01-2023 - 01-31-2025`__;<br><br> -- > The backtest consisted of a sliding window with three periods: formation | decision | trading; <br><br> -- > Parameters: window sizes `\(w \in \{65, 130\}\)` and quantile parameter `\(\alpha \in \{0.05, 0.15\}\)`; <br><br> -- > The algorithm was implemented in R. --- ###<font color =#4C5455><code> Empirical Results: __backtesting__ </code></font> <iframe src="animated_z_score_with_spike_and_stabilization6.html" width="750" height="520" style="border:none;"></iframe> --- ###<font color =#4C5455><code> Empirical Results: __U.S.__ </code></font> <br> <br> <br> .center[ [__LINK__](./results/US2.html) ] --- ###<font color =#4C5455><code> Empirical Results: __Brazil__ </code></font> <br> <br> <br> .center[ [__LINK__](./results/Brazil2.html) ] --- ###<font color =#4C5455><code> Simulation Study: __pair selection__ </code></font> Performance of the Bayesian method in correctly rejecting false positives.
<div style="border: 1px solid #ddd; padding: 0px; overflow-y: scroll; height:400px; "><table class="table table-striped" style="color: black; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;position: sticky; top:0; background-color: #FFFFFF;"> Beta </th> <th style="text-align:left;position: sticky; top:0; background-color: #FFFFFF;"> Metric </th> <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> w500 </th> <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> w252 </th> <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> w180 </th> <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> w120 </th> <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> w90 </th> <th style="text-align:left;position: sticky; top:0; background-color: #FFFFFF;"> w60 </th> <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> Total </th> </tr> </thead> <tbody> <tr grouplength="3"><td colspan="9" style="border-bottom: 1px solid;"><strong>β₁ = 0.50</strong></td></tr> <tr> <td style="text-align:left;padding-left: 2em;" indentlevel="1"> </td> <td style="text-align:left;"> False Positives </td> <td style="text-align:right;"> 73.00 </td> <td style="text-align:right;"> 36.00 </td> <td style="text-align:right;"> 70 </td> <td style="text-align:right;"> 71.00 </td> <td style="text-align:right;"> 52.00 </td> <td style="text-align:left;"> 0 </td> <td style="text-align:right;"> 302.00 </td> </tr> <tr> <td style="text-align:left;padding-left: 2em;" indentlevel="1"> </td> <td style="text-align:left;"> Correctly Rejected </td> <td style="text-align:right;"> 43.00 </td> <td style="text-align:right;"> 17.00 </td> <td style="text-align:right;"> 70 </td> <td style="text-align:right;"> 70.00 </td> <td style="text-align:right;"> 35.00 </td> <td style="text-align:left;"> 0 </td> <td 
style="text-align:right;"> 235.00 </td> </tr> <tr> <td style="text-align:left;padding-left: 2em;" indentlevel="1"> </td> <td style="text-align:left;"> Proportion </td> <td style="text-align:right;"> 0.59 </td> <td style="text-align:right;"> 0.47 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0.99 </td> <td style="text-align:right;"> 0.67 </td> <td style="text-align:left;"> - </td> <td style="text-align:right;"> 0.78 </td> </tr> <tr grouplength="3"><td colspan="9" style="border-bottom: 1px solid;"><strong>β₁ = 0.75</strong></td></tr> <tr> <td style="text-align:left;padding-left: 2em;" indentlevel="1"> </td> <td style="text-align:left;"> False Positives </td> <td style="text-align:right;"> 73.00 </td> <td style="text-align:right;"> 36.00 </td> <td style="text-align:right;"> 70 </td> <td style="text-align:right;"> 72.00 </td> <td style="text-align:right;"> 34.00 </td> <td style="text-align:left;"> 2 </td> <td style="text-align:right;"> 287.00 </td> </tr> <tr> <td style="text-align:left;padding-left: 2em;" indentlevel="1"> </td> <td style="text-align:left;"> Correctly Rejected </td> <td style="text-align:right;"> 44.00 </td> <td style="text-align:right;"> 19.00 </td> <td style="text-align:right;"> 70 </td> <td style="text-align:right;"> 72.00 </td> <td style="text-align:right;"> 17.00 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:right;"> 223.00 </td> </tr> <tr> <td style="text-align:left;padding-left: 2em;" indentlevel="1"> </td> <td style="text-align:left;"> Proportion </td> <td style="text-align:right;"> 0.60 </td> <td style="text-align:right;"> 0.53 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1.00 </td> <td style="text-align:right;"> 0.50 </td> <td style="text-align:left;"> 0.5 </td> <td style="text-align:right;"> 0.78 </td> </tr> <tr grouplength="3"><td colspan="9" style="border-bottom: 1px solid;"><strong>β₁ = 1.00</strong></td></tr> <tr> <td style="text-align:left;padding-left: 
2em;" indentlevel="1"> </td> <td style="text-align:left;"> False Positives </td> <td style="text-align:right;"> 76.00 </td> <td style="text-align:right;"> 33.00 </td> <td style="text-align:right;"> 70 </td> <td style="text-align:right;"> 72.00 </td> <td style="text-align:right;"> 68.00 </td> <td style="text-align:left;"> 0 </td> <td style="text-align:right;"> 319.00 </td> </tr> <tr> <td style="text-align:left;padding-left: 2em;" indentlevel="1"> </td> <td style="text-align:left;"> Correctly Rejected </td> <td style="text-align:right;"> 43.00 </td> <td style="text-align:right;"> 16.00 </td> <td style="text-align:right;"> 70 </td> <td style="text-align:right;"> 72.00 </td> <td style="text-align:right;"> 17.00 </td> <td style="text-align:left;"> 0 </td> <td style="text-align:right;"> 218.00 </td> </tr> <tr> <td style="text-align:left;padding-left: 2em;" indentlevel="1"> </td> <td style="text-align:left;"> Proportion </td> <td style="text-align:right;"> 0.57 </td> <td style="text-align:right;"> 0.48 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1.00 </td> <td style="text-align:right;"> 0.25 </td> <td style="text-align:left;"> - </td> <td style="text-align:right;"> 0.68 </td> </tr> <tr grouplength="3"><td colspan="9" style="border-bottom: 1px solid;"><strong>β₁ = 1.25</strong></td></tr> <tr> <td style="text-align:left;padding-left: 2em;" indentlevel="1"> </td> <td style="text-align:left;"> False Positives </td> <td style="text-align:right;"> 71.00 </td> <td style="text-align:right;"> 37.00 </td> <td style="text-align:right;"> 70 </td> <td style="text-align:right;"> 71.00 </td> <td style="text-align:right;"> 37.00 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:right;"> 287.00 </td> </tr> <tr> <td style="text-align:left;padding-left: 2em;" indentlevel="1"> </td> <td style="text-align:left;"> Correctly Rejected </td> <td style="text-align:right;"> 41.00 </td> <td style="text-align:right;"> 20.00 </td> <td 
style="text-align:right;"> 70 </td> <td style="text-align:right;"> 70.00 </td> <td style="text-align:right;"> 18.00 </td> <td style="text-align:left;"> 0 </td> <td style="text-align:right;"> 219.00 </td> </tr> <tr> <td style="text-align:left;padding-left: 2em;" indentlevel="1"> </td> <td style="text-align:left;"> Proportion </td> <td style="text-align:right;"> 0.58 </td> <td style="text-align:right;"> 0.54 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0.99 </td> <td style="text-align:right;"> 0.49 </td> <td style="text-align:left;"> 0 </td> <td style="text-align:right;"> 0.76 </td> </tr> <tr grouplength="3"><td colspan="9" style="border-bottom: 1px solid;"><strong>β₁ = 1.50</strong></td></tr> <tr> <td style="text-align:left;padding-left: 2em;" indentlevel="1"> </td> <td style="text-align:left;"> False Positives </td> <td style="text-align:right;"> 75.00 </td> <td style="text-align:right;"> 35.00 </td> <td style="text-align:right;"> 72 </td> <td style="text-align:right;"> 71.00 </td> <td style="text-align:right;"> 17.00 </td> <td style="text-align:left;"> 0 </td> <td style="text-align:right;"> 270.00 </td> </tr> <tr> <td style="text-align:left;padding-left: 2em;" indentlevel="1"> </td> <td style="text-align:left;"> Correctly Rejected </td> <td style="text-align:right;"> 45.00 </td> <td style="text-align:right;"> 17.00 </td> <td style="text-align:right;"> 72 </td> <td style="text-align:right;"> 70.00 </td> <td style="text-align:right;"> 0.00 </td> <td style="text-align:left;"> 0 </td> <td style="text-align:right;"> 204.00 </td> </tr> <tr> <td style="text-align:left;padding-left: 2em;" indentlevel="1"> </td> <td style="text-align:left;"> Proportion </td> <td style="text-align:right;"> 0.60 </td> <td style="text-align:right;"> 0.49 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0.99 </td> <td style="text-align:right;"> 0.00 </td> <td style="text-align:left;"> - </td> <td style="text-align:right;"> 0.76 </td> 
</tr> <tr grouplength="3"><td colspan="9" style="border-bottom: 1px solid;"><strong>Overall Totals</strong></td></tr> <tr> <td style="text-align:left;padding-left: 2em;" indentlevel="1"> </td> <td style="text-align:left;"> False Positives </td> <td style="text-align:right;"> 368.00 </td> <td style="text-align:right;"> 177.00 </td> <td style="text-align:right;"> 352 </td> <td style="text-align:right;"> 357.00 </td> <td style="text-align:right;"> 208.00 </td> <td style="text-align:left;"> 3 </td> <td style="text-align:right;"> 1465.00 </td> </tr> <tr> <td style="text-align:left;padding-left: 2em;" indentlevel="1"> </td> <td style="text-align:left;"> Correctly Rejected </td> <td style="text-align:right;"> 216.00 </td> <td style="text-align:right;"> 89.00 </td> <td style="text-align:right;"> 352 </td> <td style="text-align:right;"> 354.00 </td> <td style="text-align:right;"> 87.00 </td> <td style="text-align:left;"> 1 </td> <td style="text-align:right;"> 1099.00 </td> </tr> <tr> <td style="text-align:left;padding-left: 2em;" indentlevel="1"> </td> <td style="text-align:left;"> Proportion </td> <td style="text-align:right;"> 0.59 </td> <td style="text-align:right;"> 0.50 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0.99 </td> <td style="text-align:right;"> 0.42 </td> <td style="text-align:left;"> 0.33 </td> <td style="text-align:right;"> 0.75 </td> </tr> </tbody> </table></div> --- class: inverse, center, middle <!-- title-slide-section-grey, --> ## Non-Overlapping Block Bootstrap --- ###<font color =#4C5455><code> Theoretical Background: __non-overlapping block bootstrap__ </code></font> <code> Inspired by Lahiri (2003), we have: </code> <br> -- <code> __Bootstrap sample__ </code> `\(\leftrightarrow\)` <code> __Original sample__: </code> `$$(W^*_{(j-1)b+1}, W^*_{(j-1)b+2}, \ldots, W^*_{jb}) = B_{I_j} = (W_{(I_j-1)b+1}, W_{(I_j-1)b+2}, \ldots, W_{I_jb})$$` .my-style2[ for `\(j = 1, 2, \ldots, k\)` ] Where: <!-- > j is a specific block; -->
> `\(b\)` is the block size; > `\(W\)` represents an observation in each type of sample; > the left-hand side is the `\(j\)`-th block of the bootstrap sample; > the right-hand side is the `\(I_j\)`-th block of the original sample; > `\(I_1, I_2, \ldots, I_k\)` are random indices indicating which original block is used to form the `\(j\)`-th bootstrap block. <!-- ??? iterate over the possible values --> --- ###<font color =#4C5455><code> Theoretical Background: __non-overlapping block bootstrap__ </code></font> <br> <br> <br> <code> __Adaptation for partial end blocks:__ </code> `\begin{equation} B_j=\begin{cases}(W_{(j-1)b+1}, W_{(j-1)b+2}, \ldots, W_{jb}), & \text{if } j < k' \\ (W_{(j-1)b+1}, W_{(j-1)b+2}, \ldots, W_n), & \text{if } j = k'\end{cases} \end{equation}` __ where `\(k' = \lceil n/b \rceil\)`. __ --- ###<font color =#4C5455><code> Theoretical Background: __consistency of the NBB procedure for__ </code> `\(\beta_1\)` </font> <code> Using results from Lahiri (2003), Bradley (2005), Hamilton (1994), Skorokhod (1965), and Billingsley (1999), we can verify that </code> `$$\mathbb{P}^* \left( \sqrt{n}(\hat{\beta}_1^* - \hat{\beta}_1) \leq x \right) \stackrel{p}{\to} \mathbb{P} \left( \sqrt{n}(\hat{\beta}_1 - \beta_1) \leq x \right) \quad \text{as } n \to \infty$$` ...
after demonstrating that: (1) the bootstrap variance estimator correctly converges to the true variance `\(V\)`; and (2) the bootstrap estimator has the same asymptotic normality as the original OLS estimator `\(\sqrt{n} (\hat{\beta}_1 - \beta_1) \stackrel{d}{\to} \mathcal{N}(0, V)\)`: <br> `$$\text{Var}^*(\sqrt{n}(\hat{\beta}_1^* - \hat{\beta}_1)) \stackrel{p}{\to} V$$` <br> `$$\sqrt{n} (\hat{\beta}_1^* - \hat{\beta}_1) \stackrel{d^*}{\to} \mathcal{N}(0, V)$$` --- ###<font color =#4C5455><code> NBB method: __mechanics__ </code></font> <img src="index_files/figure-html/unnamed-chunk-7-1.png" width="864" style="display: block; margin: auto;" /> --- class: inverse, center, middle <!-- title-slide-section-grey, --> ## Results for the NBB Method --- ###<font color =#4C5455><code> Empirical Results: __data & methodology__ </code></font> <br> <br> -- > Alpha Vantage and MT5 data, 30-minute timeframe; <br><br> -- > 2 years of data: __`02-01-2023 - 01-31-2025`__;<br><br> -- > The backtest consisted of a sliding window with three periods: formation | decision | trading; <br><br> -- > Parameters: window sizes `\(w \in \{65, 130\}\)`, quantile parameter `\(\alpha \in \{0.05, 0.15\}\)`, and block size `\(b \in \{13, 26, 39, 65\}\)`; <br><br> -- > The algorithm was implemented in R. --- ###<font color =#4C5455><code> Empirical Results: __backtesting__ </code></font> <iframe src="animated_z_score_with_spike_and_stabilization6.html" width="750" height="520" style="border:none;"></iframe> --- ###<font color =#4C5455><code> Empirical Results: __U.S.__ </code></font> <br> <br> <br> .center[ [__LINK__](./results/US3.html) ] --- ###<font color =#4C5455><code> Empirical Results: __Brazil__ </code></font> <br> <br> <br> .center[ [__LINK__](./results/Brazil3.html) ] --- class: inverse, center, middle <!-- title-slide-section-grey, --> ## Conclusion --- ###<font color =#4C5455><code> Conclusion: __general remarks & future research__ </code></font> .pull-left[ > Both Bayesian and NBB algorithms
significantly outperformed the standard cointegration strategy; <br> > Superior risk metrics: volatility, maximum drawdown, Sharpe, Sortino; <br> > NBB vs. Bayesian: <br> > + fewer parameter sensitivities; <br><br> > + higher returns; ] .pull-right[ > Bayesian: <br> > + Offers a more refined tool for pair selection `\(\to\)` applicability beyond pairs trading. <br> > Both methods inaugurate a family of distribution-based strategies in pairs trading <br> > __Future research:__ <br> > + broader range of markets and asset classes; <br> > + different parameter configurations; <br> > + incorporating transaction costs. ] --- class: inverse, center, middle <!-- title-slide-section-grey, --> ## Part II - Equipment Life-cycle Prediction --- ###<font color =#4C5455><code> Introduction: __NBAF project__ </code></font> National Bio and Agro-Defense Facility (NBAF) .pull-left[   ] .pull-right[   ] --- ###<font color =#4C5455><code> Introduction: __reliability theory__ </code></font> <br> <br> -- Cornerstone of performance and safety assurance for complex technological systems; <br> <br> -- Has its origins in the mid-20th century, driven by military needs; <br> <br> -- Expanded into decision-based frameworks that prioritize __risk__ `\(\to\)` probabilistic approaches: <br> `\begin{equation} \text{Risk} = \sum_{i} P(E_i) \times C(E_i) \end{equation}` --- ###<font color =#4C5455><code> Proposed method: __motivation__ </code></font> <br> <br> -- > All probabilistic approaches (survival models) presuppose the availability of at least some historical failure data; <br> -- > Critical gap in reliability engineering: inability to generate meaningful predictions in contexts of __complete data absence__;<br> > + barrier to risk-based management in environments where it would be most valuable: __newly commissioned facilities__; __specialized equipment with limited deployment__ `\(\to\)` case study: National Bio and Agro-Defense Facility (NBAF) > + alternatives: accelerated life testing; data
pooling; surrogate data; physics-based models `\(\to\)` all insufficient. --- ###<font color =#4C5455><code> Proposed method: __motivation__ </code></font> <br> <br> <br> > The Bayesian framework initially offers a natural solution to the data-scarcity problem; <br> <br> > + Encodes expert judgement, theoretical understanding, and manufacturer specifications into the priors; <br> > + __But__ we still need failure data for the likelihood; <br><br> -- > __Previous Idea__:<br> > + Johnson _et al._ (2005): hierarchical Bayesian model to estimate early reliability using borrowed data from comparable systems `\(\to\)` __problematic__ `\(\to\)` failure mechanisms are highly context-dependent. --- ###<font color =#4C5455><code> Proposed method: __unorthodox hierarchical Bayesian method__ </code></font> > __Observation:__ Bayesian models are capable of automatically balancing prior beliefs and evidence from data.</br></br> -- > __Proposal:__ Create a hierarchical Bayesian model that relies heavily on expert judgement and other operational characteristics before failure data is collected. </br></br> <!-- ??? As new data enters the system, the model will balance itself automatically without the need for recalibration.</br></br> --> -- > __How:__ > + `First`: simulate the likelihood function using equipment operational information, synthesizing theoretical considerations, physical constraints, and domain knowledge about failure mechanisms. > + `Second`: conduct meticulous prior elicitation based on readily available operational variables <!-- such as manufacturer-specified expected lifetime, equipment runtime, maintenance quality, and system criticality. --> -- <code> __Hypotheses:__ </code> > __(i)__ establishes a formal mechanism for transforming qualitative information into quantifiable probabilistic statements; > __(ii)__ enables the model to learn from operational data that precedes failure <!-- ???
creates a structure for continuously refining predictions based on evolving operational information, even before actual failure occurs --> --- class: inverse, center, middle <!-- title-slide-section-grey, --> ## Implementation --- ###<font color =#4C5455><code> Model: __Bayesian hierarchical model__ </code></font> <br> <br> <br> For each piece of equipment `\(i\)`, we have: <br> <br> `\begin{equation} t_i \sim \mathit{Wei}(k_i, \lambda_i) \\ \text{ }\\ k_i \sim \mathcal{LN}(\mu_i, \sigma_i^2) \\ \text{ }\\ \lambda_i \sim \Gamma(\alpha_{i}, \beta_{i}) \end{equation}` --- ###<font color =#4C5455><code> Likelihood: __simulated Weibull__ </code></font> `\begin{equation} h(t) = \frac{\hat{k}_i}{\hat{\lambda}_i} \left(\frac{t}{\hat{\lambda}_i}\right)^{\hat{k}_i -1}, \quad t > 0, \hat{k}_i > 0, \hat{\lambda}_i > 0 \end{equation}` > We rely on prior knowledge and equipment operational characteristics.<br><br> > __ `\(\hat{k}_i\)` __ (shape parameter):</br> > + recently installed equipment: `\(\hat{k}_i < 1\)` `\(\to\)` early-stage failures</br> > + otherwise: `\(\hat{k}_i > 1\)` `\(\to\)` aging and wear-out failures</br> > __ `\(\hat{\lambda}_i\)` __ (scale parameter or _characteristic life_):</br> > + average or median of the lifetimes provided by the manufacturer for a specific group of similar equipment --- ###<font color =#4C5455><code> Priors: __Gamma__ </code></font> > Mapped variable: __expected life__ `\(\xi_i\)` (in years) > Mapping functions: .pull-left[ `\begin{equation} \alpha_i = \frac{1}{\Delta_i^2} \\ \beta_i = \frac{\alpha_i}{\xi_i} \end{equation}` > `\(\Delta_i\)` measures the dispersion (uncertainty) between manufacturer specifications and engineering assessments for the expected life ] .pull-right[ `\begin{equation} \Delta_i = \max\left(\frac{|\xi_i - \hat{\mu}_{\text{eng},i}|}{\xi_i}, c\right) \end{equation}` `\begin{equation} \hat{\mu}_{\text{eng},i} = \frac{a_i + b_i}{2} \end{equation}` > `\(c > 0\)` avoids `\(\Delta_i = 0\)`; >
`\(\hat{\mu}_{\text{eng},i}\)` is the midpoint of the engineer's estimated minimum and maximum life values. > `\(a_i\)` is the minimum and `\(b_i\)` the maximum estimated life for the equipment. ] --- ###<font color =#4C5455><code> Gamma prior: __sensitivity to input variables__ </code></font> <img src="index_files/figure-html/unnamed-chunk-8-1.png" width="720" style="display: block; margin: auto;" /> --- ###<font color =#4C5455><code> Priors: __Log-normal__ </code></font> <br> <br> > Mapped variables: __age, runtime, maintenance quality__, and __criticality__, each expressed as a proportion `\(x\)` of the expected life, i.e., in `\([0,1]\)`. > Mapping functions: <br> `\begin{equation} \sigma_{k_i}^2 = x^2 \\ \label{log-mu} \mu_{k_i} = \log(\hat{k}_i) \times C - \frac{\sigma_{k_i}^2}{2} \end{equation}` > `\(\hat{k}_i\)` represents the baseline failure rate for the analyzed equipment; > `\(x\)` is the proportion of the expected life for the given input variable; > `\(C\)` is a scaling factor that makes the survival curve more or less responsive.
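Taken together, the Gamma and Log-normal mappings transcribe directly into base R. The defaults `c = 0.05` and `C = 1`, and the numbers in the example calls, are illustrative assumptions, not values from the dissertation:

```r
# Gamma prior on lambda_i: alpha_i = 1/Delta^2, beta_i = alpha_i/xi,
# with Delta = max(|xi - mu_eng|/xi, c) and mu_eng = (a + b)/2.
gamma_hyper <- function(xi, a, b, c = 0.05) {
  mu_eng <- (a + b) / 2
  Delta  <- max(abs(xi - mu_eng) / xi, c)
  alpha  <- 1 / Delta^2
  list(alpha = alpha, beta = alpha / xi)
}

# Log-normal prior on k_i: sigma2 = x^2, mu = log(k_hat) * C - sigma2/2,
# where x in [0, 1] is the proportion of the expected life.
lognormal_hyper <- function(k_hat, x, C = 1) {
  sigma2 <- x^2
  list(mu = log(k_hat) * C - sigma2 / 2, sigma2 = sigma2)
}

# Example: 15-year expected life, engineer's range 10-20 years,
# equipment at 40% of expected life, baseline shape k_hat = 1.5.
gamma_hyper(xi = 15, a = 10, b = 20)     # Delta hits the floor c -> alpha = 400
lognormal_hyper(k_hat = 1.5, x = 0.4)    # sigma2 = 0.16
```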
--- ###<font color =#4C5455><code> Log-normal prior: __sensitivity to input variables__ </code></font> <img src="index_files/figure-html/unnamed-chunk-9-1.png" width="720" style="display: block; margin: auto;" /> --- class: inverse, center, middle <!-- title-slide-section-grey, --> ## Results --- ###<font color =#4C5455><code> Results: __NBAF App__ </code></font> <br> <br> <br> .center[ [__LINK__](https://allanvc.shinyapps.io/nbaf-app) ] --- class: inverse, center, middle <!-- title-slide-section-grey, --> ## Conclusions --- ###<font color =#4C5455><code> Conclusions </code></font> -- > The proposed framework effectively circumvents the problem of scarce historical data; <br><br> > It integrates well with reliability theory; <br> <br> -- > Limitations:<br> > + highly dependent on prior beliefs;<br> > + disproportionate weight on the expected-life parameter (prior and likelihood); <br> <br> -- > Future research: <br> > + Improve prior elicitation and validation;<br> > + Room to incorporate other environmental and operational factors;<br> > + Expand the hierarchical levels of the model;<br> > + Simulation and further empirical validation. --- class: inverse, center, middle <!-- title-slide-section-grey, --> ## Thank you!